Streaming Similarity Self-Join

Abstract

We introduce and study the problem of computing the similarity self-join in a streaming context (SSSJ), where the input is an unbounded stream of items arriving continuously. The goal is to find all pairs of items in the stream whose similarity is greater than a given threshold. The simplest formulation of the problem requires unbounded memory, and thus, it is intractable. To make the problem feasible, we introduce the notion of time-dependent similarity: the similarity of two items decreases with the difference in their arrival time. By leveraging the properties of this time-dependent similarity function, we design two algorithmic frameworks to solve the SSSJ problem. The first one, MiniBatch (MB), uses existing index-based filtering techniques for the static version of the problem, and combines them in a pipeline. The second framework, Streaming (STR), adds time filtering to the existing indexes, and integrates new time-based bounds deeply in the working of the algorithms. We also introduce a new indexing technique (L2), which is based on an existing state-of-the-art indexing technique (L2AP), but is optimized for the streaming case. Extensive experiments show that the STR algorithm, when instantiated with the L2 index, is the most scalable option across a wide array of datasets and parameters.
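The idea of time-dependent similarity can be illustrated with a minimal brute-force sketch. It is not the MB or STR framework and uses no index-based filtering; it only shows why a decaying similarity bounds memory. The abstract does not state the similarity measure or the form of the decay, so the cosine similarity, the exponential decay factor, and the names `theta` (threshold), `lam` (decay rate), and `streaming_self_join` are all assumptions made for illustration.

```python
import math
from collections import deque


def dot(x, y):
    """Dot product of two sparse vectors stored as {dimension: weight} dicts."""
    if len(x) > len(y):
        x, y = y, x
    return sum(w * y.get(d, 0.0) for d, w in x.items())


def normalize(x):
    """Scale a non-zero sparse vector to unit length, so dot() gives the cosine."""
    norm = math.sqrt(sum(w * w for w in x.values()))
    return {d: w / norm for d, w in x.items()}


def streaming_self_join(stream, theta, lam):
    """Yield (older_id, newer_id, sim) for pairs with decayed similarity > theta.

    stream: iterable of (item_id, timestamp, vector), timestamps non-decreasing.
    Assumed decay (not given in the abstract): sim = cosine * exp(-lam * dt).
    """
    # Cosine is at most 1, so once exp(-lam * dt) < theta no pair can qualify.
    # That gives a finite time horizon after which an item can be forgotten.
    horizon = math.log(1.0 / theta) / lam
    window = deque()  # items still within the horizon, oldest first
    for item_id, t, vec in stream:
        vec = normalize(vec)
        # Evict items that are too old to match anything arriving from now on.
        while window and t - window[0][1] > horizon:
            window.popleft()
        # Brute-force comparison against every item still in the window.
        for old_id, old_t, old_vec in window:
            sim = dot(vec, old_vec) * math.exp(-lam * (t - old_t))
            if sim > theta:
                yield old_id, item_id, sim
        window.append((item_id, t, vec))


# Hypothetical toy stream: "a" and "b" are identical and arrive close together.
stream = [
    ("a", 0.0, {1: 1.0}),
    ("b", 1.0, {1: 1.0}),
    ("c", 100.0, {1: 1.0}),
    ("d", 100.5, {2: 1.0}),
]
pairs = list(streaming_self_join(stream, theta=0.5, lam=0.1))
```

Under these assumptions, memory is bounded by the number of items that arrive within one horizon, which is what makes the streaming problem feasible. The indexing techniques named in the abstract would replace the inner brute-force loop.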


Bibliographic record

Authors: Morales, Gianmarco De Francisci; Gionis, Aristides
Year: 2016
Original format: PDF

Similar documents

1. Similarity Join and Similarity Self-Join Size Estimation in a Streaming Environment [J]. IEEE Transactions on Knowledge and Data Engineering. 2020, issue 4
2. Accelerating the similarity self-join using the GPU [J]. Michael Gowanlock, Ben Karsin. Journal of Parallel and Distributed Computing. 2019, issue Nova
3. Using similarity measures in prediction of changes in financial market stream data-Experimental approach [J]. Data & Knowledge Engineering. 2020, issue Jana
4. Streaming Similarity Self-Join [C]. Aristides Gionis, Aalto University. International conference on very large data bases. 2016
5. An Automatic Similarity Detection Engine Between Sacred Texts Using Text Mining and Similarity Measures [D]. Qahl, Salha Hassan Muhammed. 2014
6. Similarity of stream width distributions across headwater systems [O]. George H. Allen, Tamlin M. Pavelsky, Eric A. Barefoot
7. Near-Duplicate Video Detection Based on an Approximate Similarity Self-Join Strategy [O]. da Silva, Henrique Batista, Patrocino Jr., Zenilton, Gravier, Guillaume, 2016
